Stochastic Approximation for Risk-Aware Markov Decision Processes
Authors
Abstract
We develop a stochastic approximation-type algorithm to solve finite state/action, infinite-horizon, risk-aware Markov decision processes. Our algorithm has two loops. The inner loop computes the risk by solving a saddle-point problem. The outer loop performs $Q$-learning to compute an optimal risk-aware policy. Several widely investigated risk measures (e.g., conditional value-at-risk, optimized certainty equivalent, and absolute semideviation) are covered by our algorithm. The almost sure convergence rate of the algorithm is established. For an error tolerance $\epsilon>0$ for the $Q$-value estimation gap and a learning rate $k\in(1/2,\,1]$, the overall convergence rate of our algorithm is $\Omega((\ln(1/\delta\epsilon)/\epsilon^{2})^{1/k}+(\ln(1/\epsilon))^{1/(1-k)})$ with probability at least $1-\delta$.
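The two-loop structure can be sketched on a toy problem. Everything below is an illustrative assumption rather than the paper's actual algorithm: the random 3-state/2-action cost MDP, the batch size, the step size, and in particular the inner loop, which here evaluates the conditional value-at-risk via a grid search over the optimized-certainty-equivalent representation instead of a stochastic saddle-point iteration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy MDP (hypothetical): 3 states, 2 actions, cost convention (minimize).
nS, nA, gamma, alpha_cvar = 3, 2, 0.9, 0.25
P = rng.dirichlet(np.ones(nS), size=(nS, nA))  # transition kernel P[s, a, s']
R = rng.uniform(0.0, 1.0, size=(nS, nA))       # deterministic one-step cost

def cvar_oce(samples, alpha):
    """Inner loop (simplified): CVaR as an optimized certainty equivalent,
    CVaR_alpha(X) = min_eta { eta + E[(X - eta)_+] / alpha }  (costs).
    A grid search over eta stands in for the paper's saddle-point iteration."""
    x = np.asarray(samples, dtype=float)
    etas = np.linspace(x.min(), x.max(), 33)
    vals = etas + np.maximum(x[None, :] - etas[:, None], 0.0).mean(axis=1) / alpha
    return float(vals.min())

def risk_q_learning(n_iters=3000, lr=0.1):
    """Outer loop: Q-learning whose one-step target applies CVaR to a batch
    of sampled next-state values instead of a plain expectation."""
    Q = np.zeros((nS, nA))
    for _ in range(n_iters):
        s, a = rng.integers(nS), rng.integers(nA)
        ns = rng.choice(nS, size=16, p=P[s, a])           # sampled next states
        target = R[s, a] + gamma * cvar_oce(Q[ns].min(axis=1), alpha_cvar)
        Q[s, a] += lr * (target - Q[s, a])
    return Q

Q = risk_q_learning()
```

Since costs lie in [0, 1] and CVaR of a bounded batch never exceeds its maximum, the iterates stay bounded by 1/(1 - gamma), mirroring the usual Q-learning stability argument.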
Similar resources
Approximate Value Iteration for Risk-aware Markov Decision Processes
We consider large-scale Markov decision processes (MDPs) with a risk measure of variability in cost, under the risk-aware MDPs paradigm. Previous studies showed that risk-aware MDPs, based on a minimax approach to handling the risk measure, can be solved using dynamic programming for small- to medium-sized problems. However, due to the “curse of dimensionality”, MDPs that model real-life problem...
A Convex Analytic Approach to Risk-Aware Markov Decision Processes
Abstract. In classical Markov decision process (MDP) theory, we search for a policy that, say, minimizes the expected infinite-horizon discounted cost. Expectation is, of course, a risk-neutral measure, which does not suffice in many applications, particularly in finance. We replace the expectation with a general risk functional, and call such models risk-aware MDP models. We consider minimization ...
Central-limit approach to risk-aware Markov decision processes
Whereas classical Markov decision processes maximize the expected reward, we consider minimizing the risk. We propose to evaluate the risk associated to a given policy over a long-enough time horizon with the help of a central limit theorem. The proposed approach works whether the transition probabilities are known or not. We also provide a gradient-based policy improvement algorithm that conver...
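The central-limit idea above can be sketched as follows. The two-state chain, the reward values, and the use of a plain i.i.d. variance estimate (which ignores the autocorrelation correction a Markov-chain CLT would require) are all simplifying assumptions for illustration, not the paper's method.

```python
import math
import random

random.seed(1)

# Hypothetical two-state chain under a fixed policy, with per-state rewards.
P = {0: [(0, 0.9), (1, 0.1)], 1: [(0, 0.5), (1, 0.5)]}
reward = {0: 1.0, 1: -2.0}

def step(s):
    """Sample the next state from the transition row of state s."""
    r, acc = random.random(), 0.0
    for ns, p in P[s]:
        acc += p
        if r <= acc:
            return ns
    return P[s][-1][0]

def clt_risk(horizon=5000, n_burn=1000, n_est=20000, threshold=0.0):
    """Estimate per-step reward mean/variance from one long run, then use a
    normal approximation for the total reward over `horizon` steps to get
    P(total reward < threshold * horizon) without simulating many horizons."""
    s = 0
    for _ in range(n_burn):          # burn-in toward stationarity
        s = step(s)
    xs = []
    for _ in range(n_est):
        xs.append(reward[s])
        s = step(s)
    mu = sum(xs) / len(xs)
    var = sum((x - mu) ** 2 for x in xs) / (len(xs) - 1)
    z = (threshold - mu) * horizon / math.sqrt(var * horizon)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF

risk = clt_risk()
```

With stationary mean reward 0.5 per step, the probability of a negative cumulative reward over 5000 steps is essentially zero under this approximation.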
Accelerated decomposition techniques for large discounted Markov decision processes
Many hierarchical techniques to solve large Markov decision processes (MDPs) are based on a partition of the state space into strongly connected components (SCCs) that can be classified into levels. In each level, smaller problems called restricted MDPs are solved, and these partial solutions are then combined to obtain the global solution. In this paper, we first propose a novel algorith...
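The SCC-decomposition step these techniques rely on can be sketched directly. The toy state graph below and the choice of Kosaraju's algorithm are illustrative assumptions, not taken from the paper.

```python
from collections import defaultdict

# Illustrative state graph: edge s -> s' means s' is reachable in one step.
# States {0,1,2} form a cycle, {3,4} form a cycle, 5 is absorbing.
edges = {0: [1], 1: [2], 2: [0, 3], 3: [4], 4: [3, 5], 5: []}

def sccs(graph):
    """Kosaraju's algorithm: one DFS records finishing order, a second DFS
    on the transposed graph collects each strongly connected component."""
    seen = set()

    def dfs(u, g, out):
        # Iterative DFS; appends nodes to `out` in finishing order.
        stack = [(u, iter(g[u]))]
        seen.add(u)
        while stack:
            node, it = stack[-1]
            advanced = False
            for v in it:
                if v not in seen:
                    seen.add(v)
                    stack.append((v, iter(g[v])))
                    advanced = True
                    break
            if not advanced:
                stack.pop()
                out.append(node)

    order = []
    for u in graph:
        if u not in seen:
            dfs(u, graph, order)

    rev = defaultdict(list)          # transposed graph
    for u, vs in graph.items():
        for v in vs:
            rev[v].append(u)

    seen.clear()
    comps = []
    for u in reversed(order):        # decreasing finish time
        if u not in seen:
            comp = []
            dfs(u, rev, comp)
            comps.append(sorted(comp))
    return comps

comps = sccs(edges)
```

Each returned component is one restricted sub-problem; ordering the components topologically gives the levels in which they can be solved and recombined.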
Markov Decision Processes: Discrete Stochastic Dynamic Programming
The Wiley-Interscience Paperback Series consists of selected books that have been made more accessible to consumers in an effort to increase global appeal and general circulation. With these new unabridged softcover...
Journal
Journal title: IEEE Transactions on Automatic Control
Year: 2021
ISSN: 0018-9286, 1558-2523, 2334-3303
DOI: https://doi.org/10.1109/tac.2020.2989702